DUTH Does Probabilities of Relevance at the Legal Track

Authors

  • Dim P. Papadopoulos
  • Vicky S. Kalogeiton
  • Avi Arampatzis
Abstract

We participated in the Learning Task of the TREC 2010 Legal Track, focusing solely on estimating probabilities of relevance. We submitted three automated runs based on the same tf.idf ranking, produced from the topic narratives and the positive-only feedback of the training data, with equal contributions from each. The runs differ in how the probabilities of relevance are estimated: (1) DUTHsdtA employed the Truncated Normal-Exponential model to turn scores into probabilities; (2) DUTHsdeA did not assume any specific component score distributions but estimated them from the scores of the training data via Kernel Density Estimation (KDE); (3) DUTHlrgA used Logistic Regression with coefficients estimated on the scores of the training data. We found that DUTHsdeA and DUTHlrgA are greatly affected by biases in the training set, since they assume that the input score data are uniformly sampled. KDE was also found to be very sensitive to its parameters, which greatly influence the probability estimates. In these respects, DUTHsdtA proved to be the most robust method.
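To make the score-to-probability step concrete, here is a minimal sketch of the idea behind a truncated normal-exponential score-distribution model, under stated assumptions: relevant scores follow a normal density and non-relevant scores an exponential density, both truncated to a hypothetical score range [lo, hi], and Bayes' rule combines them with a mixing weight lam (the assumed fraction of relevant documents). All parameter values are hypothetical and assumed to be already fitted; the abstract does not say how DUTHsdtA fitted them, and the exact truncation used in that run may differ.

```python
import math

# Assumption (sketch only): relevant scores ~ Normal(mu, sigma), non-relevant
# scores ~ Exponential(rate) starting at lo, both truncated to [lo, hi].


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_pdf(x, mu, sigma, lo, hi):
    """Normal density renormalised so that it integrates to 1 over [lo, hi]."""
    if x < lo or x > hi:
        return 0.0
    mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass


def truncated_exponential_pdf(x, rate, lo, hi):
    """Exponential density shifted to start at lo, renormalised over [lo, hi]."""
    if x < lo or x > hi:
        return 0.0
    mass = 1.0 - math.exp(-rate * (hi - lo))
    return rate * math.exp(-rate * (x - lo)) / mass


def prob_relevance(score, lam, mu, sigma, rate, lo, hi):
    """Bayes' rule: P(R|s) = lam*f_R(s) / (lam*f_R(s) + (1 - lam)*f_N(s))."""
    rel = lam * truncated_normal_pdf(score, mu, sigma, lo, hi)
    non = (1.0 - lam) * truncated_exponential_pdf(score, rate, lo, hi)
    total = rel + non
    # Outside [lo, hi] both densities are zero; report probability 0.
    return rel / total if total > 0 else 0.0
```

Because the component shapes are fixed in advance, only a handful of parameters (lam, mu, sigma, rate) need estimating; with the made-up values lam=0.1, mu=0.6, sigma=0.1, rate=10 on [0, 1], a score of 0.8 maps to a much higher probability than a score of 0.2.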
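A nonparametric alternative, in the spirit of DUTHsdeA, can be sketched with kernel density estimates of the relevant and non-relevant training scores. This sketch assumes a Gaussian kernel, a single shared bandwidth, and a prior taken from the training proportions; the kernel and bandwidth selection actually used in the run are not stated in the abstract, and the function names are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_prob_relevance(rel_scores, non_scores, bandwidth):
    """Build P(R|s) from KDEs of relevant and non-relevant training scores."""
    f_rel = gaussian_kde(rel_scores, bandwidth)
    f_non = gaussian_kde(non_scores, bandwidth)
    # Assumption: the prior is the training proportion of relevant documents.
    # This is only meaningful if the judged documents are a uniform sample.
    prior = len(rel_scores) / (len(rel_scores) + len(non_scores))

    def prob(score):
        rel = prior * f_rel(score)
        non = (1.0 - prior) * f_non(score)
        total = rel + non
        # Far from all training scores both densities can underflow to 0;
        # fall back to the prior in that case.
        return rel / total if total > 0 else prior

    return prob
```

With a few invented scores, evaluating the same score under bandwidths of 0.02 and 0.2 gives noticeably different probabilities, which illustrates the parameter sensitivity reported in the abstract. Both the densities and the prior are read directly off the training judgments, so any sampling bias in those judgments carries straight into the estimates.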
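A logistic-regression mapping from score to probability, as in DUTHlrgA, can be sketched as P(R|s) = 1 / (1 + exp(-(a + b s))) with the coefficients a and b fitted to labelled training scores. This sketch uses Newton's method with a small ridge penalty (`reg`, a hypothetical parameter added for numerical stability); the fitting procedure used for the run is not given in the abstract.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(scores, labels, reg=1e-3, iters=50, tol=1e-10):
    """Fit P(R|s) = sigmoid(a + b*s) by Newton's method on the
    ridge-penalised log-likelihood; labels are 1 (relevant) or 0."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = 0.0          # gradient of the log-likelihood
        m11 = m12 = m22 = 0.0    # negative Hessian (2x2, symmetric)
        for s, y in zip(scores, labels):
            p = sigmoid(a + b * s)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * s
            m11 += w
            m12 += w * s
            m22 += w * s * s
        # Ridge penalty -(reg/2)(a^2 + b^2).
        g_a -= reg * a
        g_b -= reg * b
        m11 += reg
        m22 += reg
        # Newton step: solve the 2x2 system by its closed-form inverse.
        det = m11 * m22 - m12 * m12
        da = (m22 * g_a - m12 * g_b) / det
        db = (m11 * g_b - m12 * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def logistic_prob(score, a, b):
    return sigmoid(a + b * score)
```

The intercept `a` absorbs the base rate of relevance in the training data, so, as with the KDE approach, the fitted probabilities are only as representative as the training sample.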

Similar resources

Report on Thomson Legal and Regulatory Experiments at CLEF-2004

Thomson Legal and Regulatory participated in the CLEF-2004 monolingual and bilingual tracks. Monolingual experiments included Portuguese, Russian and Finnish. We investigated a new query structure to handle Finnish compounds. Our main focus was bilingual search from German to French. Our approach used query translation and post-translation pseudo-relevance feedback. We compared two translation ...

Overview of the TREC 2010 Legal Track Notebook Draft 2010.10.25

The TREC 2010 Legal Track consisted of two distinct tasks: the learning task, in which participants were required to estimate the probability of relevance for each document, and the interactive task, in which participants were required to identify all relevant documents using a human-in-the-loop process. 2010 is the fifth year of the legal track, the third year of the interactive task within the ...

PRIS at TREC 2011 Legal Track Discovery Based on Relevant Feedback

To complete the task of the TREC 2011 Legal Track, this paper puts forward an experimental method that combines Indri and relevance feedback to estimate the probability of relevance of every document in a collection.

Another View of the Classical Problem of Comparing Two Probabilities

The usual calculation of the P-value for the classical problem of comparing probabilities is not always accurate. This issue arose in the context of a legal dispute that depended on when some material was written in a diary. The problem raises some issues concerning the foundations of statistical inference.

Cluster-Based Relevance Feedback: Legal Track 2011

This is our second participation in the TREC Legal Track. The TREC Legal Track 2011 featured only the Learning Task. We participated in Topics 401 and 403. We used Lemur 4.11 for Boolean retrieval and followed it with a clustering technique, where we chose members from each cluster (which we called seeds) for relevance judgement by the TA and assumed all other members of the cluster whose seeds...

Publication date: 2010